
optimize internlm xcomposer2 performance #11550

Merged

Conversation

@MeouSker77 MeouSker77 commented Jul 10, 2024

Description

Optimize InternLM-XComposer2 performance.

1. Why the change?

2. User API changes

Use the following code to load the model for best performance (`sym_int8` for better accuracy, `sym_int4` for better speed):

from ipex_llm.transformers import AutoModelForCausalLM

# quantize the language model to symmetric int8, but keep the vision
# transformer ("vit") in its original precision
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True,
                                             load_in_low_bit="sym_int8",
                                             modules_to_not_convert=["vit"])
model = model.half()
model = model.eval()

A complete example:

import time
import torch

from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

ckpt_path = "internlm-xcomposer2-vl-7b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(ckpt_path, trust_remote_code=True,
                                             load_in_low_bit="sym_int8", modules_to_not_convert=["vit"])
model = model.half()
model = model.eval()

print(model)

query = '<ImageHere>Please describe this image in detail.'
image = 'image1.webp'

model = model.to('xpu')  # move the model to the Intel GPU

with torch.inference_mode():
    # run a few iterations: the first pass includes one-time warm-up cost,
    # so only the later timings reflect steady-state latency
    for i in range(3):
        st = time.time()
        response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False, max_new_tokens=1)
        et = time.time()
        print(response)
        print(et - st)
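The loop above is a common warm-up-then-measure benchmarking pattern. A minimal, library-free sketch of the same idea, with a dummy workload standing in for `model.chat` (all names here are hypothetical, not part of the ipex-llm API):

```python
import time

def run_once():
    # dummy workload standing in for a model.chat(...) call
    return sum(i * i for i in range(100_000))

timings = []
for _ in range(3):
    st = time.time()
    run_once()
    timings.append(time.time() - st)

# discard the first (warm-up) iteration when reporting latency
warmup, steady = timings[0], timings[1:]
print(f"warm-up: {warmup:.4f}s, steady-state average: {sum(steady) / len(steady):.4f}s")
```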

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: manually trigger the PR Validation by entering the PR number (e.g., 1234), and paste the action link here once it has finished successfully.
  • Application test
  • Document test
  • ...

@MeouSker77 MeouSker77 requested a review from rnwang04 July 10, 2024 07:27
@MeouSker77 MeouSker77 merged commit 82f9514 into intel-analytics:main Jul 10, 2024
1 check passed
@MeouSker77 MeouSker77 deleted the optimize-internlm-xcompossor branch July 10, 2024 07:57
RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024